{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第12章 TensorFlow计算加速\n",
    "在前面的章节中介绍了使用TensorFlow实现各种深度学习的算法。**然而要将深度学习应用到实际问题中，一个非常大的问题在于训练深度学习模型需要的计算量太大**。比如要将第6章中介绍的Inception-v3模型在单机上训练到78%的正确率需要将近半年的时间（[来源](https://research.googleblog.com/2016/04/announcing-tensorflow-08-now-with.html)，2016.04），这样的训练速度是完全无法应用到实际生产中的。为了加速训练过程，本章将介绍如何通过TensorFlow利用GPU或／和分布式计算进行模型训练。\n",
    "\n",
    "- 首先，在12.1节中将介绍如何在TensorFlow中使用**单个GPU进行计算加速**，也将介绍生成TensorFlow会话（tf.Session）时的一些常用参数。通过这些参数可以使调试更加方便而且程序的可扩展性更好。\n",
    "- 然而，在很多情况下，单个GPU的加速效率无法满足训练大型深度学习模型的计算量需求，这时将需要利用更多的计算资源。为了同时利用多个GPU或者多台机器，12.2节中将介绍**训练深度学习模型的井行方式**。\n",
    "- 然后，12.3节将介绍如何**在一台机器的多个GPU上并行化地训练深度学习模型**。在这一节中也将给出具体的TensorFlow样例程序来使用多GPU训练模型，并比较并行化效率提升的比率。\n",
    "- 最后在12.4节中将介绍**分布式TensorFlow**，以及如何通过分布式TensorFlow训练深度学习模型。在这一节中将给出具体的TensorFlow样例程序来实现不同的分布式深度学习训练模式。\n",
    "\n",
    "## 12.1 TensorFlow使用GPU\n",
    "**TensorFlow程序可以通过`tf.device`函数来指定运行每一个操作的设备，这个设备可以是本地的CPU或者GPU，也可以是某一台远程的服务器。**但在本节中只关心本地的设备。TensorFlow会给每一个可用的设备一个名称，`tf.device`函数可以通过设备的名称来指定执行运算的设备。比如CPU在TensorFlow中的名称为/cpu:0。在默认情况下：\n",
    "- 即使机器有多个CPU, TensorFlow也不会区分它们，所有的CPU都使用/cpu:0作为名称。\n",
    "- 而一台机器上不同GPU的名称是不同的，第n个GPU在TensorFlow中的名称为/gpu:n，比如第一个GPU的名称为/gpu:0，第二个为/gpu:1，以此类推。\n",
    "\n",
    "**1. 查看运算的设备**\n",
    "\n",
    "TensorFlow提供了一个快捷的方式来查看运行每一个运算的设备。在生成会话时，可以通过设置log_device_placement参数来打印运行每一个运算的设备。以下程序展示了如何使用log_device_placement这个参数："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2. 4. 6.]\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')\n",
    "b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')\n",
    "c = a + b\n",
    "\n",
    "# 通过log_device_placement参数来记录运行每一个运算的设备。\n",
    "sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))\n",
    "print(sess.run(c))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*注意上述代码必须在命令行运行(文件见本文件同目录下a.py)，才会看到如下图的具体运行设备详细内容，参考[here](https://stackoverflow.com/questions/39677168/tensorflow-documentations-example-code-on-logging-device-placement-doesnt-pr)，下同。*\n",
    "<p align='center'>\n",
    "    <img src=images/output1.JPG>\n",
    "    <center>图12-1 output1</center>\n",
    "</p>\n",
    "\n",
    "在以上代码中，TensorFlow程序生成会话时加入了参数log_device placement=True，所以程序会将运行每一个操作的设备输出到屏幕。于是除了可以看到最后的计算结果，还可以看到类似“add: (Add)/job:localhost/replica:0/task:0/device:CPU:0”这样的输出。这些输出显示了执行每一个运算的设备。比如加法操作add是通过CPU来运行的，因为它的设备名称中包含了CPU:0。\n",
    "\n",
    "**在配置好GPU环境的TensorFlow中，如果操作没有明确地指定运行设备，那么TensorFlow会优先选择GPU**。比如将以上代码在亚马逊（Amazon Web Services, AWS）的g2.8xlarge实例上运行时，会得到类似以下的运行结果:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "Device mapping:\n",
    "/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus\n",
    "id: 0000:00:03.0\n",
    "/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GRID K520, pci bus\n",
    "id: 0000:00:04.0\n",
    "/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: GRID K520, pci bus\n",
    "id :0000:00:05.0\n",
    "/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: GRID K520, pci bus\n",
    "id :0000:00:06.0\n",
    "\n",
    "add : (Add): /job:localhost/replica:0/task:0/gpu:0\n",
    "b : (Const): /job:localhost/replica:0/task:0/gpu:0\n",
    "a : (Const): /job:localhost/replica:0/task:0/gpu:0\n",
    "[ 2 . 4 . 6.]\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以上输出中，尽管g2.8xlarge实例有4个GPU，TensorFlow只会默认将运算优先放到/gpu:0上。如上所示。\n",
    "\n",
    "**2. 指定运算的设备**\n",
    "\n",
    "**如果需要将某些运算放到不同的GPU或者CPU上，就需要通过`tf.device`来手工指定**。以下程序给出了一个手工指定运行设备的样例："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2. 4. 6.]\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "with tf.device('/cpu:0'):\n",
    "    a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')\n",
    "    b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')\n",
    "with tf.device('/gpu:1'):\n",
    "    c = a + b\n",
    "    \n",
    "with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:\n",
    "    print(sess.run(c))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以上代码在在AWS g2.8xlarge实例上运行上述代码可以得到以下结果："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "Device mapping:\n",
    "/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus\n",
    "id: 0000:00:03.0\n",
    "/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GRID K520, pci bus\n",
    "id: 0000:00:04.0\n",
    "/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: GRID K520, pci bus\n",
    "id :0000:00:05.0\n",
    "/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: GRID K520, pci bus\n",
    "id :0000:00:06.0\n",
    "\n",
    "add : (Add): /job:localhost/replica:0/task:0/gpu:1\n",
    "b : (Const): /job:localhost/replica:0/task:0/cpu:0\n",
    "a : (Const): /job:localhost/replica:0/task:0/cpu:0\n",
    "[ 2 . 4 . 6.]\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在本人笔记本上运行（b.py），得到结果如下（由于只有一个gpu，指定为:/gpu:0，其余相同）：\n",
    "<p align='center'>\n",
    "    <img src=images/output2.JPG>\n",
    "    <center>图12-2 output2</center>\n",
    "</p>\n",
    "\n",
    "在以上代码中可以看到生成常量a和b的操作被加载到了CPU上，而加法操作被放到了第二个GPU “/gpu:1”上。\n",
    "\n",
    "**3. GPU释放到CPU**\n",
    "\n",
    "**但是在TensorFlow中，不是所有的操作都可以被放在GPU上，如果强行将无法放在GPU上的操作指定到GPU上，那么程序将会报错。**以下代码给出了一个报错的样例："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "ename": "InvalidArgumentError",
     "evalue": "Cannot assign a device for operation 'a_gpu': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.\nColocation Debug Info:\nColocation group had the following types and devices: \nVariableV2: CPU \nAssign: CPU \nIdentity: GPU CPU \n\nColocation members and user-requested devices:\n  a_gpu (VariableV2) /device:GPU:0\n  a_gpu/Assign (Assign) /device:GPU:0\n  a_gpu/read (Identity) /device:GPU:0\n\nRegistered kernels:\n  device='CPU'\n  device='GPU'; dtype in [DT_HALF]\n  device='GPU'; dtype in [DT_FLOAT]\n  device='GPU'; dtype in [DT_DOUBLE]\n  device='GPU'; dtype in [DT_INT64]\n\n\t [[{{node a_gpu}} = VariableV2[container=\"\", dtype=DT_INT32, shape=[], shared_name=\"\", _device=\"/device:GPU:0\"]()]]\n\nCaused by op 'a_gpu', defined at:\n  File \"d:\\python3\\Lib\\runpy.py\", line 193, in _run_module_as_main\n    \"__main__\", mod_spec)\n  File \"d:\\python3\\Lib\\runpy.py\", line 85, in _run_code\n    exec(code, run_globals)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel_launcher.py\", line 16, in <module>\n    app.launch_new_instance()\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\traitlets\\config\\application.py\", line 658, in launch_instance\n    app.start()\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\kernelapp.py\", line 486, in start\n    self.io_loop.start()\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tornado\\platform\\asyncio.py\", line 112, in start\n    self.asyncio_loop.run_forever()\n  File \"d:\\python3\\Lib\\asyncio\\base_events.py\", line 421, in run_forever\n    self._run_once()\n  File \"d:\\python3\\Lib\\asyncio\\base_events.py\", line 1431, in _run_once\n    handle._run()\n  File \"d:\\python3\\Lib\\asyncio\\events.py\", line 145, in _run\n    self._callback(*self._args)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tornado\\platform\\asyncio.py\", line 102, in _handle_events\n    handler_func(fileobj, events)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tornado\\stack_context.py\", line 276, in null_wrapper\n    return fn(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\zmq\\eventloop\\zmqstream.py\", line 450, in _handle_events\n    self._handle_recv()\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\zmq\\eventloop\\zmqstream.py\", line 480, in _handle_recv\n    self._run_callback(callback, msg)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\zmq\\eventloop\\zmqstream.py\", line 432, in _run_callback\n    callback(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tornado\\stack_context.py\", line 276, in null_wrapper\n    return fn(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\kernelbase.py\", line 283, in dispatcher\n    return self.dispatch_shell(stream, msg)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\kernelbase.py\", line 233, in dispatch_shell\n    handler(stream, idents, msg)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\kernelbase.py\", line 399, in execute_request\n    user_expressions, allow_stdin)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\ipkernel.py\", line 208, in do_execute\n    res = shell.run_cell(code, store_history=store_history, silent=silent)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\zmqshell.py\", line 537, in run_cell\n    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\IPython\\core\\interactiveshell.py\", line 2728, in run_cell\n    interactivity=interactivity, compiler=compiler, result=result)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\IPython\\core\\interactiveshell.py\", line 2850, in run_ast_nodes\n    if self.run_code(code, result):\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\IPython\\core\\interactiveshell.py\", line 2910, in run_code\n    exec(code_obj, self.user_global_ns, self.user_ns)\n  File \"<ipython-input-1-be7e268e5269>\", line 8, in <module>\n    a_gpu = tf.Variable(0, name=\"a_gpu\")\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 145, in __call__\n    return cls._variable_call(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 141, in _variable_call\n    aggregation=aggregation)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 120, in <lambda>\n    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variable_scope.py\", line 2441, in default_variable_creator\n    expected_shape=expected_shape, import_scope=import_scope)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 147, in __call__\n    return super(VariableMetaclass, cls).__call__(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 1104, in __init__\n    constraint=constraint)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 1240, in _init_from_args\n    name=name)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\state_ops.py\", line 77, in variable_op_v2\n    shared_name=shared_name)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\gen_state_ops.py\", line 1731, in variable_v2\n    shared_name=shared_name, name=name)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\framework\\op_def_library.py\", line 787, in _apply_op_helper\n    op_def=op_def)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\util\\deprecation.py\", line 488, in new_func\n    return func(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\framework\\ops.py\", line 3272, in create_op\n    op_def=op_def)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\framework\\ops.py\", line 1768, in __init__\n    self._traceback = tf_stack.extract_stack()\n\nInvalidArgumentError (see above for traceback): Cannot assign a device for operation 'a_gpu': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.\nColocation Debug Info:\nColocation group had the following types and devices: \nVariableV2: CPU \nAssign: CPU \nIdentity: GPU CPU \n\nColocation members and user-requested devices:\n  a_gpu (VariableV2) /device:GPU:0\n  a_gpu/Assign (Assign) /device:GPU:0\n  a_gpu/read (Identity) /device:GPU:0\n\nRegistered kernels:\n  device='CPU'\n  device='GPU'; dtype in [DT_HALF]\n  device='GPU'; dtype in [DT_FLOAT]\n  device='GPU'; dtype in [DT_DOUBLE]\n  device='GPU'; dtype in [DT_INT64]\n\n\t [[{{node a_gpu}} = VariableV2[container=\"\", dtype=DT_INT32, shape=[], shared_name=\"\", _device=\"/device:GPU:0\"]()]]\n",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mInvalidArgumentError\u001b[0m                      Traceback (most recent call last)",
      "\u001b[1;32md:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\client\\session.py\u001b[0m in \u001b[0;36m_do_call\u001b[1;34m(self, fn, *args)\u001b[0m\n\u001b[0;32m   1291\u001b[0m     \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1292\u001b[1;33m       \u001b[1;32mreturn\u001b[0m \u001b[0mfn\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m*\u001b[0m\u001b[0margs\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   1293\u001b[0m     \u001b[1;32mexcept\u001b[0m \u001b[0merrors\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mOpError\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32md:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\client\\session.py\u001b[0m in \u001b[0;36m_run_fn\u001b[1;34m(feed_dict, fetch_list, target_list, options, run_metadata)\u001b[0m\n\u001b[0;32m   1274\u001b[0m       \u001b[1;31m# Ensure any changes to the graph are reflected in the runtime.\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1275\u001b[1;33m       \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_extend_graph\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   1276\u001b[0m       return self._call_tf_sessionrun(\n",
      "\u001b[1;32md:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\client\\session.py\u001b[0m in \u001b[0;36m_extend_graph\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m   1311\u001b[0m     \u001b[1;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_graph\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_session_run_lock\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m  \u001b[1;31m# pylint: disable=protected-access\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1312\u001b[1;33m       \u001b[0mtf_session\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mExtendSession\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_session\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   1313\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mInvalidArgumentError\u001b[0m: Cannot assign a device for operation 'a_gpu': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.\nColocation Debug Info:\nColocation group had the following types and devices: \nVariableV2: CPU \nAssign: CPU \nIdentity: GPU CPU \n\nColocation members and user-requested devices:\n  a_gpu (VariableV2) /device:GPU:0\n  a_gpu/Assign (Assign) /device:GPU:0\n  a_gpu/read (Identity) /device:GPU:0\n\nRegistered kernels:\n  device='CPU'\n  device='GPU'; dtype in [DT_HALF]\n  device='GPU'; dtype in [DT_FLOAT]\n  device='GPU'; dtype in [DT_DOUBLE]\n  device='GPU'; dtype in [DT_INT64]\n\n\t [[{{node a_gpu}} = VariableV2[container=\"\", dtype=DT_INT32, shape=[], shared_name=\"\", _device=\"/device:GPU:0\"]()]]",
      "\nDuring handling of the above exception, another exception occurred:\n",
      "\u001b[1;31mInvalidArgumentError\u001b[0m                      Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-1-be7e268e5269>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      9\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     10\u001b[0m \u001b[1;32mwith\u001b[0m \u001b[0mtf\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mSession\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mconfig\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mtf\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mConfigProto\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlog_device_placement\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0msess\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 11\u001b[1;33m     \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0msess\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtf\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mglobal_variables_initializer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;32md:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\client\\session.py\u001b[0m in \u001b[0;36mrun\u001b[1;34m(self, fetches, feed_dict, options, run_metadata)\u001b[0m\n\u001b[0;32m    885\u001b[0m     \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    886\u001b[0m       result = self._run(None, fetches, feed_dict, options_ptr,\n\u001b[1;32m--> 887\u001b[1;33m                          run_metadata_ptr)\n\u001b[0m\u001b[0;32m    888\u001b[0m       \u001b[1;32mif\u001b[0m \u001b[0mrun_metadata\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    889\u001b[0m         \u001b[0mproto_data\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mtf_session\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mTF_GetBuffer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrun_metadata_ptr\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32md:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\client\\session.py\u001b[0m in \u001b[0;36m_run\u001b[1;34m(self, handle, fetches, feed_dict, options, run_metadata)\u001b[0m\n\u001b[0;32m   1108\u001b[0m     \u001b[1;32mif\u001b[0m \u001b[0mfinal_fetches\u001b[0m \u001b[1;32mor\u001b[0m \u001b[0mfinal_targets\u001b[0m \u001b[1;32mor\u001b[0m \u001b[1;33m(\u001b[0m\u001b[0mhandle\u001b[0m \u001b[1;32mand\u001b[0m \u001b[0mfeed_dict_tensor\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1109\u001b[0m       results = self._do_run(handle, final_targets, final_fetches,\n\u001b[1;32m-> 1110\u001b[1;33m                              feed_dict_tensor, options, run_metadata)\n\u001b[0m\u001b[0;32m   1111\u001b[0m     \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1112\u001b[0m       \u001b[0mresults\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32md:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\client\\session.py\u001b[0m in \u001b[0;36m_do_run\u001b[1;34m(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)\u001b[0m\n\u001b[0;32m   1284\u001b[0m     \u001b[1;32mif\u001b[0m \u001b[0mhandle\u001b[0m \u001b[1;32mis\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1285\u001b[0m       return self._do_call(_run_fn, feeds, fetches, targets, options,\n\u001b[1;32m-> 1286\u001b[1;33m                            run_metadata)\n\u001b[0m\u001b[0;32m   1287\u001b[0m     \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1288\u001b[0m       \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_do_call\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0m_prun_fn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mhandle\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfeeds\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mfetches\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32md:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\client\\session.py\u001b[0m in \u001b[0;36m_do_call\u001b[1;34m(self, fn, *args)\u001b[0m\n\u001b[0;32m   1306\u001b[0m           self._config.experimental.client_handles_error_formatting):\n\u001b[0;32m   1307\u001b[0m         \u001b[0mmessage\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0merror_interpolation\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0minterpolate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mmessage\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_graph\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1308\u001b[1;33m       \u001b[1;32mraise\u001b[0m \u001b[0mtype\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0me\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnode_def\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mop\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmessage\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   1309\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1310\u001b[0m   \u001b[1;32mdef\u001b[0m \u001b[0m_extend_graph\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mInvalidArgumentError\u001b[0m: Cannot assign a device for operation 'a_gpu': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.\nColocation Debug Info:\nColocation group had the following types and devices: \nVariableV2: CPU \nAssign: CPU \nIdentity: GPU CPU \n\nColocation members and user-requested devices:\n  a_gpu (VariableV2) /device:GPU:0\n  a_gpu/Assign (Assign) /device:GPU:0\n  a_gpu/read (Identity) /device:GPU:0\n\nRegistered kernels:\n  device='CPU'\n  device='GPU'; dtype in [DT_HALF]\n  device='GPU'; dtype in [DT_FLOAT]\n  device='GPU'; dtype in [DT_DOUBLE]\n  device='GPU'; dtype in [DT_INT64]\n\n\t [[{{node a_gpu}} = VariableV2[container=\"\", dtype=DT_INT32, shape=[], shared_name=\"\", _device=\"/device:GPU:0\"]()]]\n\nCaused by op 'a_gpu', defined at:\n  File \"d:\\python3\\Lib\\runpy.py\", line 193, in _run_module_as_main\n    \"__main__\", mod_spec)\n  File \"d:\\python3\\Lib\\runpy.py\", line 85, in _run_code\n    exec(code, run_globals)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel_launcher.py\", line 16, in <module>\n    app.launch_new_instance()\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\traitlets\\config\\application.py\", line 658, in launch_instance\n    app.start()\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\kernelapp.py\", line 486, in start\n    self.io_loop.start()\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tornado\\platform\\asyncio.py\", line 112, in start\n    self.asyncio_loop.run_forever()\n  File \"d:\\python3\\Lib\\asyncio\\base_events.py\", line 421, in run_forever\n    self._run_once()\n  File \"d:\\python3\\Lib\\asyncio\\base_events.py\", line 1431, in _run_once\n    handle._run()\n  File \"d:\\python3\\Lib\\asyncio\\events.py\", line 145, in _run\n    self._callback(*self._args)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tornado\\platform\\asyncio.py\", line 102, in _handle_events\n    handler_func(fileobj, events)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tornado\\stack_context.py\", line 276, in null_wrapper\n    return fn(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\zmq\\eventloop\\zmqstream.py\", line 450, in _handle_events\n    self._handle_recv()\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\zmq\\eventloop\\zmqstream.py\", line 480, in _handle_recv\n    self._run_callback(callback, msg)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\zmq\\eventloop\\zmqstream.py\", line 432, in _run_callback\n    callback(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tornado\\stack_context.py\", line 276, in null_wrapper\n    return fn(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\kernelbase.py\", line 283, in dispatcher\n    return self.dispatch_shell(stream, msg)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\kernelbase.py\", line 233, in dispatch_shell\n    handler(stream, idents, msg)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\kernelbase.py\", line 399, in execute_request\n    user_expressions, allow_stdin)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\ipkernel.py\", line 208, in do_execute\n    res = shell.run_cell(code, store_history=store_history, silent=silent)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\ipykernel\\zmqshell.py\", line 537, in run_cell\n    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\IPython\\core\\interactiveshell.py\", line 2728, in run_cell\n    interactivity=interactivity, compiler=compiler, result=result)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\IPython\\core\\interactiveshell.py\", line 2850, in run_ast_nodes\n    if self.run_code(code, result):\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\IPython\\core\\interactiveshell.py\", line 2910, in run_code\n    exec(code_obj, self.user_global_ns, self.user_ns)\n  File \"<ipython-input-1-be7e268e5269>\", line 8, in <module>\n    a_gpu = tf.Variable(0, name=\"a_gpu\")\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 145, in __call__\n    return cls._variable_call(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 141, in _variable_call\n    aggregation=aggregation)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 120, in <lambda>\n    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variable_scope.py\", line 2441, in default_variable_creator\n    expected_shape=expected_shape, import_scope=import_scope)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 147, in __call__\n    return super(VariableMetaclass, cls).__call__(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 1104, in __init__\n    constraint=constraint)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\variables.py\", line 1240, in _init_from_args\n    name=name)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\state_ops.py\", line 77, in variable_op_v2\n    shared_name=shared_name)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\ops\\gen_state_ops.py\", line 1731, in variable_v2\n    shared_name=shared_name, name=name)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\framework\\op_def_library.py\", line 787, in _apply_op_helper\n    op_def=op_def)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\util\\deprecation.py\", line 488, in new_func\n    return func(*args, **kwargs)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\framework\\ops.py\", line 3272, in create_op\n    op_def=op_def)\n  File \"d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\python\\framework\\ops.py\", line 1768, in __init__\n    self._traceback = tf_stack.extract_stack()\n\nInvalidArgumentError (see above for traceback): Cannot assign a device for operation 'a_gpu': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.\nColocation Debug Info:\nColocation group had the following types and devices: \nVariableV2: CPU \nAssign: CPU \nIdentity: GPU CPU \n\nColocation members and user-requested devices:\n  a_gpu (VariableV2) /device:GPU:0\n  a_gpu/Assign (Assign) /device:GPU:0\n  a_gpu/read (Identity) /device:GPU:0\n\nRegistered kernels:\n  device='CPU'\n  device='GPU'; dtype in [DT_HALF]\n  device='GPU'; dtype in [DT_FLOAT]\n  device='GPU'; dtype in [DT_DOUBLE]\n  device='GPU'; dtype in [DT_INT64]\n\n\t [[{{node a_gpu}} = VariableV2[container=\"\", dtype=DT_INT32, shape=[], shared_name=\"\", _device=\"/device:GPU:0\"]()]]\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "# 在CPU上运行tf.Variable\n",
    "a_cpu = tf.Variable(0, name=\"a_cpu\")\n",
    "\n",
    "with tf.device('/gpu:0'):\n",
    "    # 将tf.Variable强制放在GPU上\n",
    "    a_gpu = tf.Variable(0, name=\"a_gpu\")\n",
    "    \n",
    "with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:\n",
    "    print(sess.run(tf.global_variables_initializer()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "不同版本的TensorFlow对GPU的支持不一样，如果程序中全部使用强制指定设备的方式会降低程序的可移植性。在TensorFlow的[kernel](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/kernels)中定义了哪些操作可以跑在GPU上。比如可以在[variable_ops.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/resource_variable_ops.cc#L469)程序中找到以下定义:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "#define REGISTER_GPU_KERNELS(type)                                       \\\n",
    "  REGISTER_KERNEL_BUILDER(Name(\"AssignAddVariableOp\")                    \\\n",
    "                              .Device(DEVICE_GPU)                        \\\n",
    "                              .HostMemory(\"resource\")                    \\\n",
    "                              .TypeConstraint<type>(\"dtype\"),            \\\n",
    "                          AssignUpdateVariableOp<GPUDevice, type, ADD>); \\\n",
    "  REGISTER_KERNEL_BUILDER(Name(\"AssignSubVariableOp\")                    \\\n",
    "                              .Device(DEVICE_GPU)                        \\\n",
    "                              .HostMemory(\"resource\")                    \\\n",
    "                              .TypeConstraint<type>(\"dtype\"),            \\\n",
    "                          AssignUpdateVariableOp<GPUDevice, type, SUB>);\n",
    "\n",
    "TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*注意：书上原本的代码，tensorflow已经更新为如上个cell所示，这里或许有些变动。*\n",
    "\n",
    "**在这段定义中可以看到GPU只在部分数据类型上支持tf.Variable操作**。如果在TensorFlow代码库中搜索调用这段代码的宏TF_CALL_GPU_NUMBER_TYPES，可以发现在GPU上，tf.Variable操作只支持实数型（floatl6、float32和double）的参数。而在报错的样例代码中给定的参数是整数型的，所以不支持在GPU上运行。**为避免这个问题，TensorFlow在生成会话时可以指定[allow_soft_placement](https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/core/protobuf/config.proto#L357)参数，当设置为True时，如果运算无法由GPU执行，那么TensorFlow会自动将它放到CPU上执行**。以下代码给出了一个使用allow_soft_placement参数的样例："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "# 在CPU上运行tf.Variable\n",
    "a_cpu = tf.Variable(0, name=\"a_cpu\")\n",
    "\n",
    "with tf.device('/gpu:0'):\n",
    "    # 将tf.Variable强制放在GPU上\n",
    "    a_gpu = tf.Variable(0, name=\"a_gpu\")\n",
    "    \n",
    "# 通过allow_soft_placement参数自动将无法放在GPU上的操作放回CPU\n",
    "sess = tf.Session(config=tf.ConfigProto(\n",
    "    allow_soft_placement=True, log_device_placement=True))\n",
    "sess.run(tf.global_variables_initializer())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "同样在AWS g2.8xlarge上可以得到如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "Device mapping:\n",
    "/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus\n",
    "id: 0000:00:03.0\n",
    "/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GRID K520, pci bus\n",
    "id: 0000:00:04.0\n",
    "/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: GRID K520, pci bus\n",
    "id :0000:00:05.0\n",
    "/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: GRID K520, pci bus\n",
    "id :0000:00:06.0\n",
    "a_gpu: /job:localhost/replica:0/task:0/cpu:0\n",
    "a_gpu/read: /job:localhost/replica:O/task:0/cpu:0\n",
    "a_gpu/Assign: /job:localhost/replica:O/task:0/cpu:0\n",
    "init/NoOp_1: /job:localhost/replica:0/task:0/gpu:0\n",
    "a_cpu: /job:localhost/replica:0/task:0/cpu:0\n",
    "a_cpu/read: /job:localhost/replica:0/task:0/cpu:0\n",
    "a_cpu/Assign: /job:localhost/replica:0/task:0/cpu:0\n",
    "init/NoOp: /job:localhost/replica:0/task:0/gpu:0\n",
    "init: /job :localhost/replica:0/task:0/gpu:0\n",
    "a_gpu/initial_value: /job:localhost/replica:0/task:0/gpu:0\n",
    "a_cpu/initial_value: /job:localhost/replica:0/task:0/cpu:0\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从输出的日志中可以看到在生成变量a_gpu时，无法放到GPU上的运算被自动调整到了CPU上（比如a_gpu和a_gpu/read），而可以被GPU执行的命令（比a_gpu/initial_value）依旧由GPU执行。在本人电脑下运行（c.py），输出如下：\n",
    "<p align='center'>\n",
    "    <img src=images/output3.JPG>\n",
    "    <center>图12-3 output3</center>\n",
    "</p>\n",
    "\n",
    "**4. 使用部分GPU**\n",
    "\n",
    "虽然GPU可以加速TensorFlow的计算，但一般来说不会把所有的操作全部放在GPU上。**一个比较好的实践是将计算密集型的运算放在GPU上，而把其他操作放到CPU上。GPU是机器中相对独立的资源，将计算放入或者转出GPU都需要额外的时间。而且GPU需要将计算时用到的数据从内存复制到GPU设备上，这也需要额外的时间。TensorFlow可以自动完成这些操作而不需要用户特别处理，但为了提高程序运行的速度，用户也需要尽量将相关的运算放在同一个设备上。**\n",
    "\n",
    "**TensorFlow默认会占用设备上的所有GPU以及每个GPU的所有显存。**\n",
    "\n",
    "**使用部分个数的GPU：**如果在一个TensorFlow程序中只需要使用部分GPU，可以通过设置CUDA_VISIBLE_DEVICES环境变量来控制。有三种设置方式：\n",
    "- 在运行时设置这个环境变量："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "＃只使用第二块GPU(GPU编号从0开始）。在demo_code.py中，机器上的第二块GPU的\n",
    "＃名称变成/gpu:0，不过在运行时所有/gpu:0的运算将被放在第二块GPU上。\n",
    "CUDA_VISIBLE_DEVICES=1 python demo_code.py\n",
    "\n",
    "＃只使用第一块和第二块GPU\n",
    "CUDA_VISIBLE_DEVICES=0,1 python demo_code.py\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 在程序中设置环境变量："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "import os\n",
    "\n",
    "＃只使用第三块GPU\n",
    "os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"2\"\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- GPUOptions分配，[visible_device_list](https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/core/protobuf/config.proto#L55):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "config = tf.ConfigProto()\n",
    "config.visible_device_list = '0'\n",
    "session = tf.Session(config=config, ... )\n",
    "\n",
    "# 或：\n",
    "gpu_options = tf.GPUOptions(visible_device_list='0')\n",
    "sess =  tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**使用GPU的部分显存：**TensorFlow也支持动态分配GPU的显存，使得一块GPU上可以同时运行多个任务。下面给出了TensorFlow动态分配显存的两种方法：\n",
    "- 按需分配，[allow_growth](https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/core/protobuf/config.proto#L36)："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "config = tf.ConfigProto()\n",
    "config.gpu_options.allow_growth = True\n",
    "session = tf.Session(config=config, ... )\n",
    "\n",
    "# 或：\n",
    "gpu_options = tf.GPUOptions(allow_growth=True)\n",
    "sess =  tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 按固定比例分配，[per_process_gpu_memory_fraction](https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/core/protobuf/config.proto#L23)："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "config = tf.ConfigProto()\n",
    "config.gpu_options.per_process_gpu_memory_fraction = 0.4        # 直接分配40%显存\n",
    "session = tf.Session(config=config, ... )\n",
    "\n",
    "# 或：\n",
    "gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)   \n",
    "sess =  tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "更多关于GPUOptions可以参考[tensorflow源代码](https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/core/protobuf/config.proto#L16)。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
